
Poster: pmocek Date: Oct 4, 2013 9:10am
Forum: opensource Subject: Re: Derive failed on 450MB batch of PDFs

Jeff, can you recommend any documentation of best practices for publishing a large set like this to Internet Archive? I'm trying to help crowdsource review of these federal contracts with war/intel contractor Booz Allen Hamilton, and I was hoping to take advantage of your OCRing and torrenting.


Poster: Jeff Kaplan (Administrator, Curator, or Staff) Date: Oct 4, 2013 10:08am
Forum: opensource Subject: Re: Derive failed on 450MB batch of PDFs

I'd suggest starting a new item page for the PDFs in each directory. Identifiers like these would make sense to me:
2013FOIABoozAllenFedContracts-air_force
2013FOIABoozAllenFedContracts-dept_of_agriculture_ag_research_service
2013FOIABoozAllenFedContracts-dept_of_defense_education_activity
2013FOIABoozAllenFedContracts-dept_of_transportation
2013FOIABoozAllenFedContracts-federal_aviation_administration
2013FOIABoozAllenFedContracts-federal_energy_regulatory_commission
2013FOIABoozAllenFedContracts-food_and_drug_administration
2013FOIABoozAllenFedContracts-national_cancer_institute
2013FOIABoozAllenFedContracts-patent_and_trademark_office
2013FOIABoozAllenFedContracts-united_states_postal_service
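The naming scheme above (a shared prefix, a hyphen, then the directory name in lowercase with underscores) could be generated with something like the following. This is only a sketch: the function name `item_identifier` and the example directory names are assumptions for illustration, not part of any Internet Archive tool.

```python
import re

# Shared prefix taken from the identifiers suggested above.
PREFIX = "2013FOIABoozAllenFedContracts"


def item_identifier(directory_name: str, prefix: str = PREFIX) -> str:
    """Build an identifier of the form '<prefix>-<directory_slug>'.

    Hypothetical helper: lowercases the directory name and collapses any
    run of characters other than a-z and 0-9 into a single underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", directory_name.lower()).strip("_")
    return f"{prefix}-{slug}"


if __name__ == "__main__":
    # Example directory names are assumed, not taken from the actual upload.
    for name in ["Air Force", "Dept of Transportation"]:
        print(item_identifier(name))
```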

Use archive.org/upload with Chrome, Firefox, or Safari and just drag the PDF files into the gray box on the start page. Do not try to upload a directory; it won't work.

When you're done, let me know and I can remove https://archive.org/details/2013FOIABoozAllenHamiltonFederalContracts

Hope this helps.


Poster: pmocek Date: Oct 4, 2013 1:22pm
Forum: opensource Subject: Re: Derive failed on 450MB batch of PDFs

Please delete item 2013FOIABoozAllenHamiltonFederalContracts (ark:/13960/t0dv35x3w).

I've created these, and they are deriving now:

* https://archive.org/details/2013FOIABoozAllenFedContracts-AirForce
* https://archive.org/details/2013FOIABoozAllenFedContracts-DeptOfAgricultureAgResearchService
* https://archive.org/details/2013FOIABoozAllenFedContracts-DeptOfDefenseEducationActivity
* https://archive.org/details/2013FOIABoozAllenFedContracts-DeptOfEnergy
* https://archive.org/details/2013FOIABoozAllenFedContracts-DeptOfTransportation
* https://archive.org/details/2013FOIABoozAllenFedContracts-FederalAviationAdministration
* https://archive.org/details/2013FOIABoozAllenFedContracts-FederalEnergyRegulatoryCommission
* https://archive.org/details/2013FOIABoozAllenFedContracts-FoodAndDrugAdministration
* https://archive.org/details/2013FOIABoozAllenFedContracts-NationalCancerInstitute
* https://archive.org/details/2013FOIABoozAllenFedContracts-PatentAndTrademarkOffice
* https://archive.org/details/2013FOIABoozAllenFedContracts-UnitedStatesPostalService

Is there some way to link them? I understand a collection requires at least 50 items, and it looks like "transclude other items" is not appropriate for this use.


Poster: Jeff Kaplan (Administrator, Curator, or Staff) Date: Oct 4, 2013 1:57pm
Forum: opensource Subject: Re: Derive failed on 450MB batch of PDFs

Until you have a collection, I can only suggest editing the description field of each item to include either links to all of the other items or a link to a search query that brings them all up.
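As a rough illustration of both options, the snippet below builds a block of links for a description field and a search link matching the shared identifier prefix. It is a sketch under assumptions: that the description field accepts basic HTML links, that the search page lives at `https://archive.org/search.php`, and that a wildcard query of the form `identifier:PREFIX*` matches all of the items. The helper names are hypothetical, and only three of the identifiers from this thread are listed.

```python
from urllib.parse import urlencode

PREFIX = "2013FOIABoozAllenFedContracts"

# A few of the item identifiers created in this thread.
IDENTIFIERS = [
    "2013FOIABoozAllenFedContracts-AirForce",
    "2013FOIABoozAllenFedContracts-DeptOfEnergy",
    "2013FOIABoozAllenFedContracts-DeptOfTransportation",
]


def details_url(identifier: str) -> str:
    """Return the item page URL, in the form used earlier in this thread."""
    return f"https://archive.org/details/{identifier}"


def description_links(identifiers: list[str]) -> str:
    """Option 1: one HTML link per item, to paste into each description."""
    return "\n".join(
        f'<a href="{details_url(i)}">{i}</a><br />' for i in identifiers
    )


def search_url(prefix: str) -> str:
    """Option 2: a single search link for every identifier with the prefix.

    Assumption: the search endpoint and the wildcard query syntax.
    """
    return "https://archive.org/search.php?" + urlencode(
        {"query": f"identifier:{prefix}*"}
    )


if __name__ == "__main__":
    print(description_links(IDENTIFIERS))
    print(search_url(PREFIX))
```

The search link has the advantage that it does not need updating when another item with the same prefix is added later.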

The item has been removed.

This post was modified by Jeff Kaplan on 2013-10-04 20:57:31